<!DOCTYPE html>
<html>
	<head>
		<meta charset = "utf-8">
		<title>Ghidra Filesystem Storage</title>
		<style media="screen" type="text/css">
		    pre { margin-left: 120px; }
			li { margin-left: 90px; font-family:times new roman; font-size:14pt; }
			h1 { color:#000080; font-family:times new roman; font-size:36pt; font-style:italic; font-weight:bold; text-align:center; }
			h2 { margin: 10px; margin-top: 40px; color:#984c4c; font-family:times new roman; font-size:18pt; font-weight:bold; }
			h3 { margin-left: 37px; margin-top: 20px; color:#0000ff; font-family:times new roman; font-size:14pt; font-weight:bold; }
			h4 { margin-left: 70px; margin-top: 10px; font-family:times new roman; font-size:14pt; font-weight:normal; }
			h5 { margin-left: 70px; margin-top: 10px; font-family:times new roman; font-size:14pt; font-weight:normal; }
			h6 { color:#000080; font-family:times new roman; font-size:18pt; font-style:italic; font-weight:bold; text-align:center; }
			p { margin-left: 90px; font-family:times new roman; font-size:14pt; }
			body {
				counter-reset: section;
			}
			h2 {
				counter-reset: topic;
			}
			h3 {
				counter-reset: rule;
			}
			h2::before {
				counter-increment: section;
				content: counter(section) ": ";
			}
			h3::before {
				counter-increment: topic;
				content: counter(section) "." counter(topic) " ";
			}
			h4::before {
				counter-increment: rule;
				content: counter(section) "." counter(topic) "." counter(rule) " ";
			}

			div.info {
				margin-left: 10px; margin-top: 80px; font-family:times new roman; font-size:18pt; font-weight:bold;
			}

		</style>
	</head>
	<body>
		<H1>Ghidra Filesystem Storage</H1>
		
		<H2>Introduction</H2>
		
		<P>The purpose of this document is to provide technical details of the file storage scheme(s) employed by 
		both Ghidra projects and Ghidra Server repositories.  As of this writing Ghidra has employed two
		different local filesystem storage schemes:</P>
		
		<UL>
			<LI>Indexed Filesystem (current)</LI>
			<LI>Mangled Filesystem (legacy, pre-9.0 PUBLIC release)</LI>
		</UL>
		
		<P>In addition to providing details of each storage scheme, some details are provided about how project/repository database
		files are stored within each filesystem as well as some troubleshooting tips to aid in any manual interventions
		which may be required.</P>
		
		<P>At this point in time all filesystem implementations rely on the use of a property file for each project defined 
		file (<I>*.prp</I>). For all database type project files there will be a corresponding subdirectory (<I>~*.db</I>)
		which is used to store all content related to a project file database (e.g., ProgramDB).  The naming and organization
		of these two differ significantly between the two filesystem implementations.
		
		<H3>Name Mangling/Encoding</H3>
		
		<P>Ghidra uses the following name mangling/encoding for both Ghidra Server project repository subdirectories as well as 
		project folder and file naming within the legacy Mangled Filesystem.  The goal of this encoding scheme is to preserve case-sensitive
		naming while allowing storage on a single-case or case-insensitive native filesystem.  This is achieved by mutating the original
		name with the following character substitutions:</P>
		
		<UL>
			<LI>All uppercase letters replaced with an underscore ('_') followed by the lowercase letter, and</LI>
			<LI>All underscore ('_') characters are replaced with two underscores ("__")
		</UL>
		
		<P>For a Ghidra Server repository named "My_Project", the resulting filesystem storage folder named "_my___project" 
		would appear within the server repositories directory.</P>
		
		<H3>Ghidra Project Storage</H3>
		
		<P>Ghidra projects employ multiple filesystem storage directories within the top-level project directory (<I>*.rep</I>):</P>
		
		<UL>
			<LI>Local project file storage (<I>idata</I>) provides the primary storage area for non-versioned project files
			and for versioned files checked-out by the user.</LI>
			<LI>Versioned project file storage (<I>versioned</I>) is present with private non-shared projects only and provides
			version control storage similar to a Ghidra Server repository.</LI>
			<LI>User data storage (<I>user</I>) provides non-versioned storage of user-specific data associated with a specific project file.</LI>
		</UL>
		
		<H3>Ghidra Server Storage</H3>
		
		<P>The Ghidra Server employs a separate filesystem storage directory for each project repository using a mangled 
		name (see <I>Name Mangling/Encoding</I> above).  While all newly created project repositories will use the latest
		Indexed Filesystem storage scheme Ghidra continues to support the legacy Mangled Filesystem which may be in use
		by older Ghidra Server installations.  The <I>svrAdmin</I> command provides the ability to migrate an older
		Mangled Filesystem to the current Index Filesystem (see <I>server/serverREADME.html</I>).</P>
		
		<H2>Indexed Filesystem (current)</H2>
		
		<P>This filesystem overcomes the project file-path length limitations inherent to the legacy Mangled Filesystem and
		utilizes an index file to store project file-paths and the corresponding 8-digit hexadecimal identifier for
		each
		(e.g., <I>00001234.prp</I> / <I>~00001234.db</I>).  The following files are used to manage the filesystem content
		and are located at the root of the filesystem storage directory.</P>
		
		<UL>
			<LI>Index File (<I>~index.dat<I>) - current index file
			<LI>Backup Index File (<I>~index.bak</I>) - backup of current index file</LI>
			<LI>Temporary Index File (<I>~index.tmp</I>) - temporary index file created during update</LI>
			<LI>Index Journal File (<I>~journal.dat</I>) - journal or recent changes not yet reflected by index</LI>
			<LI>Backup Journal File (<I>~journal.bak</I>) - backup of journal file created during incremental update</LI>
			<LI>Index Lock File (<I>~index.lock</I>) - index lock used to coordinate index access and modification</LI>
		</UL>
		
		<P><B>Index Rebuild</B> - If the index file becomes corrupt it may be easily rebuilt while the associated project 
		is closed or Ghidra Server stopped to avoid file access during the repair process.  While the filesystem is not 
		active/in-use all of the index
		related files mentioned above may be manually deleted from the root of the appropriate filesystem storage directory
		(see Ghidra Project Storage and Ghidra Server Storage above).  Once the filesystem store is 
		started (e.g., project opened or Ghidra Server started) the missing index will trigger an automatic rebuild of the 
		index based upon the details provided by each property file contained within (<I>*.prp</I>).
		
		<P><B>Locating Project File Storage</B> - Locating individual project files on disk requires interpretation of
		the index file (<I>~index.dat</I>) and traversing the numbered storage folders appropriately.  When locating a project
		file within the index it is important to know both the full Ghidra project directory path and project filename.
		If the project filename is unique you can simply search for the filename within the index, otherwise you will have 
		to search for project folder path first.  Sample <I>~index.dat</I> file:</P>
		
		<PRE>	
VERSION=1
/
  00000003:myFile:a701ee4b1c71321380792951888
/A
  00000100:anotherFile:a701ee4920f909328022843906
  00000105:yetAnotherFile:a701ee48743913628104779815
/A/B
  00001234:myFile:a701ee48019210920546276045
  00001200:myFile.1:a701ee48491297248400412904
/A/B/C
  00000004:someFile:a701ee4a23590324427866017
NEXT-ID:1250
MD5:d41d8cd98f00b204e9800998ecf8427e
		</PRE>
		
		<P>Once the project file of interest has been located within the index file and the corresponding 8-digit hex file number identified,
		the storage subfolder name is derived from the 2nd and 3rd digits of the file number.  Example file number "00001234" will be contained
		with the subfolder "12".  Within this subfolder the "00001234.prp" file should exist as will all other numbered files which have the 
		same 2nd/3rd digits.  The storage hierarchy for the above index file would have the following hierarchy:</P>
		
		<PRE>
./~index.dat
./00/
	00000003.prp
	~00000003.db/
	00000004.prp
	~00000004.db/
./01/
	00000100.prp
	~00000100.db/
	00000105.prp
	~00000105.db/
./12/
	00001200.prp
	~00001200.db/
	00001234.prp      <-- /A/B/myFile property file
	~00001234.db/     <-- /A/B/myFile database directory
		</PRE>
	
		<H2>Mangled Filesystem (legacy)</H2>
		
		<P>This filesystem utilizes mangled naming for all project folders and files and follows the same hierarchy as the project.  
		For example, a project file with the path <I>"/A_1/B_1/myFile"</I> would be found stored as 
		<I>"./_a__1/_b__1/my_file.prp"</I> and <I>"./_a__1/_b__1/~my_file.db/"</I>.  Due to file path-length limitations of native 
		filesystems the use of this storage scheme is no longer used by default and has been retained only for backward 
		compatibility with older projects and repositories.
		
		<H2>Removal of Corrupt Files</H2>
		
		<P>If a project or Ghidra Server repository contains a corrupt file it may not be possible to remove the file via the 
		Ghidra GUI or API.  While a detailed triage of a corrupt file may be possible by the Ghidra Development Team, such files
		may need to be removed after being copied for triage. For a shared repository this will require stopping the Ghidra Server
		and digging into the appropriate named repository directory.  For a local project simply ensure that the project is
		not in use.</P>
		
		<P>For the local project case it will be neccessary to isolate the storage issue since it could be caused by the
		local project store (<I>*.rep/idata/</I>) or versioned repository (Ghidra Server or private non-shared (<I>*.rep/versioned/</I>).
		The Ghidra Server case can easily be identified by creating another temporary shared project to the same shared repository 
		and check the behavior of the project file in question.  If the same behavior is observed the issue is likely on the server.
		If you need assistance identifying the source of the bad behavior or recommended resolution please submit a Ghidra 
		trouble ticket.<P> 
		
		<P>As discussed for each filesystem above, the specific <I>*.prp</I> file and <I>~*.db/</I> directory should be 
		identified and copied for triage.  Keeping a copy will enable triage and may enable restoring the file in the 
		future if possible.  Once this file and corresponding directory have been copied they may be removed from the filesystem.
		For the indexed filesystem case the index related files can be deleted which will trigger a rebuild of the index
		(see Indexed Filesystem above).

	</body>
</html>
